Unit 2 - Confidence Intervals

Ma 340 Applied Statistics

Author

Charles Lacey

Data Import: Nursing Salary Data

Employee | Department | DeptScale | Status | StatusNom | StressLevel | HighestDegreeEarned | YearsofExperienceasof2016 | 2012HourlyWage | 2013HourlyWage | 2014HourlyWage | 2015HourlyWage | 2016HourlyWage
31952 | Staff Dev | 0 | FT | 1 | 0 | Bachelor | 9 | 25.54 | 25.56 | 25.72 | 26.80 | 27.62
33540 | Staff Dev | 0 | FT | 1 | 0 | Master | 13 | 27.71 | 28.31 | 29.37 | 29.85 | 30.05
38274 | Staff Dev | 0 | FT | 1 | 0 | Diploma | 11 | 24.14 | 24.58 | 24.74 | 26.14 | 27.06
32490 | Staff Dev | 0 | FT | 1 | 0 | Associate | 10 | 25.09 | 25.11 | 25.17 | 26.37 | 27.53
34803 | Staff Dev | 0 | FT | 1 | 0 | Bachelor | 10 | 25.84 | 26.16 | 27.08 | 27.34 | 29.02
32915 | Staff Dev | 0 | FT | 1 | 0 | Associate | 27 | 29.08 | 29.14 | 29.30 | 29.70 | 30.62
37771 | Staff Dev | 0 | FT | 1 | 0 | Associate | 16 | 26.08 | 26.44 | 26.44 | 26.76 | 27.10
35608 | Staff Dev | 0 | FT | 1 | 0 | Associate | 7 | 23.92 | 24.60 | 24.86 | 25.10 | 25.34
37052 | Inf Cont | 0 | FT | 1 | 0 | Bachelor | 15 | 26.67 | 26.96 | 26.97 | 27.07 | 27.08
38169 | Inf Cont | 0 | FT | 1 | 0 | Associate | 26 | 28.56 | 28.59 | 29.28 | 29.77 | 30.63

Estimating Population Parameters

  • Point Estimate: The sample statistic that is the best point estimate (or single value estimate) of the population parameter.
  • Confidence Interval: An interval estimate of the true value of a population parameter. Abbreviated CI.
  • Confidence Level: The probability \(1-\alpha\) that the estimation procedure produces an interval containing the population parameter; equivalently, the proportion of intervals that would contain the parameter if the estimation process were repeated a large number of times.
  • Margin of Error: The maximum expected difference between a sample statistic and the actual population parameter at a given confidence level.

\[\text{Confidence Interval} = \text{Point Estimate} \pm \text{Margin of Error} = (\text{Point Estimate} - \text{MoE},\ \text{Point Estimate} + \text{MoE})\]

Proportions

Point Estimate

  • What is the best point estimate for the population proportion?
# Sample Proportion
pe = length(which(NursesWages$HighestDegreeEarned == "Master"))/nrow(NursesWages)

The best point estimate for the proportion of all nurses with master's degrees in 2016 is the sample proportion, \(\hat{p} = 30/286 \approx 0.105\), from the 286 nurses randomly sampled.
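
As a small illustration, the same calculation can be tried on just the ten previewed rows, using the fact that the mean of a logical vector is the proportion of TRUE values. The vector `degree` below is a hypothetical stand-in holding only those ten values, not the full 286-row `NursesWages` data, so its result differs from 0.105.

```r
# Highest degree for the ten previewed nurses only (stand-in for the full data)
degree <- c("Bachelor", "Master", "Diploma", "Associate", "Bachelor",
            "Associate", "Associate", "Associate", "Bachelor", "Associate")

# The mean of a logical vector is the proportion of TRUE values
p_hat_preview <- mean(degree == "Master")
p_hat_preview

# Full-sample estimate from the counts reported in these notes: 30 of 286
p_hat_full <- 30 / 286
round(p_hat_full, 3)
```

On the full data frame, `mean(NursesWages$HighestDegreeEarned == "Master")` should give the same value as the `length(which(...))/nrow(...)` form, provided the column has no missing values.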

Margin of Error

  • What is the appropriate Margin of Error for \(\hat{p}\)?
  • What is the most likely spread of \(\hat{p}\)’s relative to \(p\)?
Recall - Normal Approximation of the Binomial

Consider the distribution of \(\hat{p}\):

\[ \hat{p} \sim N\!\left(p,\; \frac{p(1-p)}{n}\right)\]
For \(p=0.6\): (plot of the sampling distribution of \(\hat{p}\) not reproduced here)
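
The sampling distribution for \(p = 0.6\) can be explored with a short simulation. This is only a sketch: the sample size `n = 286` (borrowed from the nursing sample), the number of replications, and the seed are assumptions chosen for illustration.

```r
set.seed(340)            # arbitrary seed, for reproducibility
p    <- 0.6              # population proportion from the text
n    <- 286              # assumed sample size (matches the nursing sample)
reps <- 10000            # number of simulated samples (assumption)

# Each draw is the number of successes in one sample of size n
p_hat <- rbinom(reps, size = n, prob = p) / n

sim_mean  <- mean(p_hat)              # should be close to p
sim_sd    <- sd(p_hat)                # should be close to the theoretical SE
theory_se <- sqrt(p * (1 - p) / n)    # standard deviation from the formula above

c(sim_mean = sim_mean, sim_sd = sim_sd, theory_se = theory_se)

# hist(p_hat, breaks = 30) would show the roughly bell-shaped distribution
```

The simulated mean and standard deviation should sit close to \(p\) and \(\sqrt{p(1-p)/n}\), which is what the normal approximation claims.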

More generally, for any \(p\), \[Z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \sim N(0,1)\]

Since \(P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1-\alpha\), each of the following inequalities holds with probability approximately \(1-\alpha\): \[ -z_{\alpha/2} < Z < z_{\alpha/2} \]
\[ -z_{\alpha/2} < \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} < z_{\alpha/2} \]
\[ -z_{\alpha/2} \cdot \sqrt{\frac{p(1-p)}{n}} < \hat{p} - p < z_{\alpha/2} \cdot \sqrt{\frac{p(1-p)}{n}} \]
\[ -\hat{p}-z_{\alpha/2} \cdot \sqrt{\frac{p(1-p)}{n}} < - p < -\hat{p} + z_{\alpha/2} \cdot \sqrt{\frac{p(1-p)}{n}} \]
\[ \hat{p}-z_{\alpha/2} \cdot \sqrt{\frac{p(1-p)}{n}} < p < \hat{p} + z_{\alpha/2} \cdot \sqrt{\frac{p(1-p)}{n}} \]

Thus, estimating the unknown \(p\) under the square root with \(\hat{p}\), \[ MoE = z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

Confidence Interval

\[ \hat{p}-z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
\[ \hat{p}-MoE < p < \hat{p} + MoE \]
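
The "repeated a large number of times" reading of the confidence level can be checked by simulation: draw many samples from a population with a known \(p\), build the interval each time, and count how often it captures \(p\). The sketch below assumes a hypothetical \(p = 0.6\), \(n = 286\), and 10,000 replications; none of these come from the nursing data.

```r
set.seed(340)      # arbitrary seed, for reproducibility
p     <- 0.6       # hypothetical true proportion
n     <- 286       # assumed sample size
alpha <- 0.05      # 95% confidence level
reps  <- 10000     # number of repeated samples (assumption)

# One sample proportion and one margin of error per simulated sample
p_hat <- rbinom(reps, size = n, prob = p) / n
moe   <- qnorm(1 - alpha / 2) * sqrt(p_hat * (1 - p_hat) / n)

# Did each interval (p_hat - moe, p_hat + moe) contain the true p?
covered  <- (p_hat - moe < p) & (p < p_hat + moe)
coverage <- mean(covered)
coverage
```

The observed coverage should land near 0.95, though not exactly on it, because the interval rests on a normal approximation.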

Example:

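Check Assumptions: the normal approximation needs enough expected successes and failures; a common rule of thumb is at least 10 of each (some texts use 5).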

nrow(NursesWages)*pe 
[1] 30
nrow(NursesWages)*(1-pe) 
[1] 256

Margin of Error

For our sample of 286 nurses, the margin of error is approximately:

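alpha = .05  # significance level for 95% confidence
# -qnorm(alpha/2) is the upper critical value z_{alpha/2} (about 1.96)
MoE = -qnorm(alpha/2)*sqrt(pe*(1-pe)/nrow(NursesWages))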
MoE
[1] 0.03551237

Confidence Interval

UB = pe + MoE
LB = pe - MoE

With 95% confidence, the proportion of all nurses with master's degrees in 2016 is between 6.94% and 14.04%.

Base R Tools

x = length(which(NursesWages$HighestDegreeEarned == "Master"))
n = nrow(NursesWages)

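# binom.test() returns the exact (Clopper-Pearson) interval, so its limits
# differ slightly from the normal-approximation interval computed by hand
output = binom.test(x, n, conf.level = 0.95)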
output$conf.int
[1] 0.07189903 0.14635036
attr(,"conf.level")
[1] 0.95
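
The `binom.test` limits do not match the hand-computed interval because the two use different methods. The sketch below places three intervals side by side using only the counts reported above (30 master's degrees among 286 nurses): the normal-approximation interval from this section, the exact interval from `binom.test`, and the Wilson score interval that `prop.test` returns when `correct = FALSE`. The variable names are illustrative.

```r
x     <- 30       # nurses with a master's degree (from the output above)
n     <- 286      # sample size
alpha <- 0.05     # 95% confidence level
p_hat <- x / n

# Normal-approximation interval, as derived in this section
moe  <- qnorm(1 - alpha / 2) * sqrt(p_hat * (1 - p_hat) / n)
wald <- c(p_hat - moe, p_hat + moe)

# Exact (Clopper-Pearson) interval
exact <- as.numeric(binom.test(x, n, conf.level = 1 - alpha)$conf.int)

# Wilson score interval (prop.test without continuity correction)
wilson <- as.numeric(prop.test(x, n, conf.level = 1 - alpha,
                               correct = FALSE)$conf.int)

round(rbind(wald, exact, wilson), 4)
```

With a sample this large the three intervals are close, so the choice matters little here; the differences grow for small samples or proportions near 0 or 1.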

Means

Point Estimate

  • What is the best point estimate for the population mean?

# Sample Mean
pe = mean(NursesWages$`2016HourlyWage`)

The best point estimate for the mean hourly wage of all nurses in 2016 is the sample mean of $27.32 from the 286 nurses randomly sampled.

Margin of Error

  • What is the appropriate Margin of Error for \(\bar{x}\)?
  • What is the most likely spread of \(\bar{x}\)’s relative to \(\mu\)?

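If the population standard deviation \(\sigma\) were known:

\[Z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}} \sim N(0,1)\]
\[\text{Margin of Error} = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\]

What is \(\sigma\)? It is rarely known in practice, so we estimate it with the sample standard deviation \(s\).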

Student t distribution

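Replacing \(\sigma\) with \(s\) gives a statistic that follows a Student t distribution with \(n-1\) degrees of freedom:

\[t = \frac{\bar{x} - \mu}{\frac{s}{\sqrt{n}}}; \ \ \ df = n-1\]

So, \[\text{Margin of Error} = t_{\alpha/2,\, n-1} \cdot \frac{s}{\sqrt{n}}\]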
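
To see what switching from \(z\) to \(t\) costs, the critical values can be compared directly. This sketch assumes \(\alpha = 0.05\) and uses \(n = 286\) to match the nursing sample; the smaller degrees of freedom are included only for contrast.

```r
alpha <- 0.05
n     <- 286

z_crit <- qnorm(1 - alpha / 2)             # normal critical value
t_crit <- qt(1 - alpha / 2, df = n - 1)    # t critical value with df = 285
c(z = z_crit, t = t_crit)

# The t critical value shrinks toward z as the degrees of freedom grow
dfs    <- c(5, 10, 30, 100, 285)
t_vals <- qt(1 - alpha / 2, df = dfs)
data.frame(df = dfs, t_crit = t_vals)
```

The t critical value is always somewhat larger than \(z_{\alpha/2}\), which widens the interval to account for the extra uncertainty from estimating \(\sigma\) with \(s\).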

Confidence Interval

\[ \bar{x}-MoE < \mu < \bar{x} + MoE \]

Example:

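Check Assumptions: with \(\sigma\) unknown, the t interval requires a normally distributed population or a large sample (\(n > 30\)).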

nrow(NursesWages) > 30
[1] TRUE

Margin of Error

For our sample of 286 nurses, \(\sigma\) is unknown, so we use the sample standard deviation \(s\) and the t distribution with \(df = 285\).

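alpha = .05  # significance level for 95% confidence
# -qt(alpha/2, df) is the upper critical value t_{alpha/2, n-1}
MoE = -qt(alpha/2, nrow(NursesWages)-1) * sd(NursesWages$`2016HourlyWage`) / sqrt(nrow(NursesWages))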
MoE
[1] 0.2119204

Confidence Interval

UB = pe + MoE
LB = pe - MoE

With 95% confidence, the mean hourly wage of all nurses in 2016 is between $27.10 and $27.53.

Base R Tools

?t.test  # opens the help page for t.test()
output = t.test(NursesWages$`2016HourlyWage`, conf.level = 0.95)
output$conf.int
[1] 27.10441 27.52825
attr(,"conf.level")
[1] 0.95
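
As a self-contained check that the hand calculation and `t.test` agree, the sketch below applies both to the ten 2016 wages shown in the data preview. The vector `wage2016` is a stand-in for the full column. Those ten rows are simply the first rows displayed, not a random sample, and with \(n = 10\) the interval also relies on the wages being roughly normal, so the numbers are for illustration only.

```r
# 2016 hourly wages for the ten previewed nurses (not the full 286-row data)
wage2016 <- c(27.62, 30.05, 27.06, 27.53, 29.02,
              30.62, 27.10, 25.34, 27.08, 30.63)

alpha <- 0.05
n     <- length(wage2016)
x_bar <- mean(wage2016)

# Margin of error from the t formula: t_{alpha/2, n-1} * s / sqrt(n)
moe     <- qt(1 - alpha / 2, df = n - 1) * sd(wage2016) / sqrt(n)
by_hand <- c(x_bar - moe, x_bar + moe)

# The same interval from base R
built_in <- as.numeric(t.test(wage2016, conf.level = 1 - alpha)$conf.int)

rbind(by_hand, built_in)
```

The two rows should be identical, since `t.test` uses the same formula for a one-sample interval.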